REAL PARTY IN INTEREST 

The real party in interest in the present Application is International Business Machines 
Corporation, the Assignee of the present application as evidenced by the Assignment set forth at 
reel 014612, frame 0595. 

RELATED APPEALS AND INTERFERENCES 

There are no other appeals or interferences known to Appellants, the Appellants' legal 
representative, or assignee, which directly affect or would be directly affected by or have a 
bearing on the Board's decision in the pending appeal. 

STATUS OF CLAIMS 

Claims 15-17 and 21-24 stand finally rejected by the Examiner as noted in the Final 
Office Action dated July 30, 2007. The rejection of Claims 15-17 and 21-24 under 35 U.S.C. § 
103(a) is appealed. 

STATUS OF AMENDMENTS 

No amendments to the claims have been made subsequent to the July 30, 2007 Final 
Office Action from which this Appeal is filed. 

SUMMARY OF THE CLAIMED SUBJECT MATTER 

As described in one embodiment in exemplary Claims 15 and 21, a method and 
computer-readable medium (supported on page 9, lines 3-12 of the originally filed specification) 
comprises: 

reading an HTML document of a web page as an analyzing object (supported on page 22, 
lines 5-6); 

conducting a temporary block analysis based on a description of HTML tags of the 
HTML document (supported on page 22, lines 6-7); 

using the HTML tags to temporarily divide the HTML document into blocks (supported 
on page 22, lines 10-11); 

identifying unnecessary information elements in the HTML document (supported on page 
24, lines 4-5), wherein the unnecessary information elements include: 

plural information elements that include an OBJECT_IMAGE having a same 
Uniform Resource Locator (URL), wherein the OBJECT_IMAGE describes a 
type of media used to display the HTML document (supported on page 20, line 11 
and page 24, lines 7-9), 

a block of text in the HTML document that is shorter than a maximum 
predetermined length, and wherein the block of text appears in the HTML 
document more than a predetermined frequency (supported on page 25, lines 12- 
16), 

multiple anchors having a same title (supported on page 24, lines 7-8), 

image tags that only perform a role of punctuation for text in the HTML 
document (supported on page 24, line 10), and 

multiple text blocks having a same description (supported on page 24, lines 8-9); 

defining any block in the HTML document that is deemed to be meaningless as an 
OBJECT_DELIMITER, wherein a block is deemed to be meaningless if that block contains only 
said unnecessary information elements and at least one anchor (supported on page 24, lines 18- 
20); and 

crawling only anchors found in blocks that have not been defined as 
OBJECT_DELIMITERs (supported on page 26, lines 9-10). 
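
For illustration only, the last two steps recited above (defining meaningless blocks as OBJECT_DELIMITERs and crawling only the remaining anchors) might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation described in the specification: the `Element` record, its `kind` labels, the `unnecessary` predicate, and both function names are hypothetical, and the test for what counts as an unnecessary information element is assumed to be supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Element:
    kind: str        # hypothetical labels: "text", "anchor", or "image"
    value: str       # text content, anchor title, or image URL
    href: str = ""   # link target; only meaningful for anchors


Block = List[Element]


def is_delimiter(block: Block, unnecessary: Callable[[Element], bool]) -> bool:
    """A block is treated as meaningless when it holds at least one anchor
    and every element in it is an unnecessary information element."""
    has_anchor = any(e.kind == "anchor" for e in block)
    return has_anchor and all(unnecessary(e) for e in block)


def anchors_to_crawl(blocks: List[Block],
                     unnecessary: Callable[[Element], bool]) -> List[str]:
    """Collect link targets only from blocks not marked as delimiters."""
    return [
        e.href
        for block in blocks
        if not is_delimiter(block, unnecessary)
        for e in block
        if e.kind == "anchor"
    ]
```

Under these assumptions, a navigation bar made up solely of repeated links and separator text would be skipped, while an anchor that sits in a block alongside substantive text would still be crawled.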

As described in Claims 16 and 22, in one embodiment a block of text is deemed to 
contain unnecessary information if that block of text is less than 12 bytes, as supported on page 
25, line 14 of the originally filed specification. 

As described in Claims 17 and 23, in one embodiment a short block of text is deemed 
insignificant if it occurs more than ten times, as supported on page 25, line 18 of the originally 
filed specification. 
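
The two thresholds recited above (shorter than 12 bytes, and occurring more than ten times) can be illustrated with a short sketch. The function name and its parameters are hypothetical, and measuring length in UTF-8 bytes is an assumption, since the passages cited above specify only "bytes" and not an encoding.

```python
from collections import Counter
from typing import Iterable, Set


def unnecessary_short_texts(texts: Iterable[str],
                            max_bytes: int = 12,
                            max_count: int = 10) -> Set[str]:
    """Return the text strings that are both short and frequent.

    A string qualifies only if its encoded length is below max_bytes
    AND it occurs more than max_count times in the document.
    """
    counts = Counter(t.strip() for t in texts)
    return {
        text
        for text, n in counts.items()
        if len(text.encode("utf-8")) < max_bytes and n > max_count
    }
```

Both conditions must hold together: a short string that appears only a few times, or a long string that appears often, is not flagged by this sketch.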

As described in Claim 24, in one embodiment the method claimed comprises: 
dividing an HTML document into blocks (supported on page 22, lines 10-11 of the 
originally filed specification); 

identifying unnecessary information elements in the HTML document (supported on page 
24, lines 4-5), wherein the unnecessary information elements include: 

a block of text in the HTML document that is shorter than a maximum 
predetermined length, and wherein the block of text appears in the HTML 
document more than a predetermined frequency (supported on page 25, lines 12- 
16), 

multiple anchors having a same title (supported on page 24, lines 7-8), 

image tags that only perform a role of punctuation for text in the HTML 
document (supported on page 24, line 10), and 

multiple text blocks having a same description (supported on page 24, lines 8-9); 

defining any block in the HTML document that is deemed to be meaningless, wherein a 
block is deemed to be meaningless if that block contains only the unnecessary information 
elements and at least one anchor (supported on page 24, lines 18-20); and 

crawling only anchors found in blocks that have not been deemed meaningless due to 
containing only the unnecessary information elements (supported on page 26, lines 9-10). 
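
The first step recited above, dividing an HTML document into blocks on the basis of its tags, might be sketched with the Python standard library's `html.parser` module. This is only a rough illustration: the set of block-level tags, the `BlockSplitter` class, and the tuple layout `(kind, value, href)` are assumptions, since neither the claims nor the passages cited above enumerate which tags mark a block boundary.

```python
from html.parser import HTMLParser

# Hypothetical choice of boundary tags; not taken from the specification.
BLOCK_TAGS = {"p", "div", "table", "tr", "td", "ul", "ol", "li",
              "h1", "h2", "h3", "hr", "br"}


class BlockSplitter(HTMLParser):
    """Splits a document into blocks of (kind, value, href) tuples."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished blocks
        self._current = []   # block currently being filled
        self._href = None    # href of the anchor that is open, if any

    def flush(self):
        # Close the current block, ignoring empty ones.
        if self._current:
            self.blocks.append(self._current)
            self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._href = dict(attrs).get("href") or ""
        elif tag == "img":
            self._current.append(("image", dict(attrs).get("src") or "", ""))

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._href = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._href is not None:
            self._current.append(("anchor", text, self._href))
        else:
            self._current.append(("text", text, ""))


def split_into_blocks(html: str):
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing content that no tag closed
    return parser.blocks
```

Each resulting block lists its text, anchors, and images in document order, which is the form on which the later identification and crawling steps would operate.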

GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL 

A. The Examiner's rejection of Claims 15 and 21 under 35 USC 103(a) as being 
unpatentable over Mantha, et al. (U.S. Patent No. 6,163,779 - "Mantha") in view of 
Wyler (U.S. Patent No. 7,047,033 - "Wyler") is to be reviewed on Appeal. 

B. The Examiner's rejection of Claims 16 and 22 under 35 USC 103(a) as being 
unpatentable over Mantha, et al. (U.S. Patent No. 6,163,779 - "Mantha") in view of 
Wyler (U.S. Patent No. 7,047,033 - "Wyler") is to be reviewed on Appeal. 

C. The Examiner's rejection of Claims 17 and 23 under 35 USC 103(a) as being 
unpatentable over Mantha, et al. (U.S. Patent No. 6,163,779 - "Mantha") in view of 
Wyler (U.S. Patent No. 7,047,033 - "Wyler") is to be reviewed on Appeal. 

D. The Examiner's rejection of Claim 24 under 35 USC 103(a) as being unpatentable over 
Mantha, et al. (U.S. Patent No. 6,163,779 - "Mantha") in view of Wyler (U.S. Patent No. 
7,047,033 - "Wyler") is to be reviewed on Appeal. 

ARGUMENTS 

A. The Examiner's rejection of Claims 15 and 21 under 35 USC 103(a) as being 
unpatentable over Mantha, et al. (U.S. Patent No. 6,163,779 - "Mantha") in view of 
Wyler (U.S. Patent No. 7,047,033 - "Wyler") is to be reviewed on Appeal. 

1. The Examiner's rejection of Claims 15 and 21 is improper since the cited prior art 
does not teach or suggest all of the claimed limitations of the present invention. 

A combination of the cited art does not teach or suggest "identifying unnecessary 
information elements in the HTML document, wherein the unnecessary information elements 
include . . . a block of text in the HTML document that is shorter than a maximum predetermined 
length, and wherein the block of text appears in the HTML document more than a predetermined 
frequency." That is, unnecessary information includes multiple repetitions of a same short block 
of text. 

The Examiner cites Wyler as teaching these limitations at col. 31, line 65 to col. 32, line 
7, and col. 32, lines 47-52, which state: 

"The "logical location" of an object which is interiorly disposed relative to 
the base object is the maximum value e.g. 100. The "logical location" of any 
other object is the distance, on the webpage, of that object from the base object." 

"Classifying one or more objects as cardinal: As described, a base object 
is selected which is the largest object on the webpage. If there is a tie, i.e. if the 
largest two or more objects are similar, to a predetermined extent, in size, then the 
object with the most words in it is typically deemed to be the base object." 

These passages teach that the location of an object can be described as a distance to the 
largest object on a webpage. Appellants respectfully traverse the Examiner's position that this is 
equivalent to multiple short blocks of text being located on an HTML document too many times. 

The Examiner also cites column 32, lines 53-57 of Wyler for teaching the feature of 
determining that a block of text is unnecessary if it is sufficiently short. This passage states: 

Preferably, if the base object is not very big, e.g. falls below a threshold 
defining the minimum size for a base object, then objects adjacent to the base 
object are combined with the base object to generate a "cardinal" of adequate size. 

As defined in column 32, lines 29-34 of Wyler, a "cardinal" is an object that pertains to a 
main subject of the webpage. Thus, a small base object is combined with other objects to 
generate the cardinal object. In this context, it is clear that this small object is significant, since it 
must be used as part of the composition of the cardinal object. Therefore, this passage teaches 
away from the claimed feature of identifying unnecessary information elements in the HTML 
document. 

The Examiner also cites column 15, lines 18-19 of Wyler for teaching that high-frequency 
objects are deemed to be irrelevant ("wherein the block of text appears in the HTML document 
more than a predetermined frequency"). The cited passage in Wyler is: 

"3. Occurrence - the number of alphanumeric strings within a text object or table." 

There is nothing in this passage to suggest that high-frequency objects are irrelevant, as presently 
claimed. Rather, the passage merely states that objects can be counted (with no suggestion of 
how this information may be relevant in determining what information is unnecessary). 

Furthermore, a combination of the cited art does not teach or suggest "a block is deemed 
to be meaningless if that block contains only said unnecessary information elements and at least 
one anchor." The Examiner responds that Wyler teaches this feature at col. 12, lines 4-8, which 
states: 

In this level the application removes irrelevant information (images and 
data i.e. advertising banners, links to unrelated issues) from the webpage, and 
reorganizes the information into objects with categories in a file represent by the 
M20 script language. 

In this passage, Wyler teaches that images such as advertising banners and links to other 
irrelevant pages can be purged from a webpage. However, there is no teaching or suggestion of 
"multiple short block of text being repeated" (as discussed above), "multiple anchors having a 
same title," "image tags that only perform a role of punctuation," and "text block having a same 
description" as being requisite components for defining a meaningless block in an HTML 
document. Similarly, there is no teaching or suggestion that this meaningless block in the 
HTML document has "at least one anchor." 

Furthermore, a combination of the cited art does not teach or suggest "crawling only 
anchors found in blocks that have not been defined as OBJECT_DELIMITERs" (i.e., crawling 
only anchors in blocks that have not been deemed meaningless). The Examiner cites col. 14, 
lines 39-44 of Wyler for this teaching. This passage states: 

Second, the application searches all the documents for words that fit into 
the Index category, and when finding such words, the application inserts an 
"index" command. From that point on, the webpages are called "documents" since 
they have no longer have properties of a webpage. 

This passage states that a crawler ("application") searches all documents for key words 
("that fit into the Index category"). There is no teaching or suggestion of crawling only anchors 
that are in blocks that have not been deemed to have unnecessary information elements (e.g., do 
not contain "plural information elements that include an OBJECT_IMAGE having a same 
Uniform Resource Locator (URL), wherein the OBJECT_IMAGE describes a type of media 
used to display the HTML document, a block of text in the HTML document that is shorter than 
a maximum predetermined length, and wherein the block of text appears in the HTML document 
more than a predetermined frequency, multiple anchors having a same title, image tags that only 
perform a role of punctuation for text in the HTML document, and multiple text blocks having a 
same description"). 

The Examiner also cites column 14, lines 15-25 of Wyler, which states: 

The insertion of the M20 script begins with scanning the entire webpage and 
parsing the contents into words related to the webpage commands and words 
related to the user-relevant information. Actually, it is a process of taking the 
additional information off the text itself. Some of the commands that are found 
may be relevant for formatting a document/webpage in a book-style 
format/webpage for devices with screen size and browser limitations. Some may 
be irrelevant (e.g. remarks, search engine keywords, etc.). The relevant commands 
that are found are translated into M20 script language. 

The cited passage is irrelevant to the feature of "crawling only anchors found in blocks 
that have not been defined as OBJECT_DELIMITERs," in which an OBJECT_DELIMITER 
describes a block that only contains unnecessary information elements. That is, the cited passage 
only states that relevant commands may be translated into M20 script language (Markup to 
Object script language). This script conversion is unrelated to crawling, particularly to crawling 
anchors. 

2. The Examiner's rejection of Claims 15 and 21 is improper since there is no 
motivation to combine features that may be taught in the cited prior art. 

Even if the cited art were to be construed as teaching or suggesting all of the features 
found in Claims 15 and 21, there is still no motivation provided in the cited art or in other 
known art to combine these features. 

The proper rationales for arriving at a conclusion of obviousness, as suggested by the 
U.S. Supreme Court in the case of KSR International Co. v. Teleflex, Inc. et al., 127 S. Ct. 1727 
(2007), include the following tests for determining a motivation to combine elements from the 
prior art: 

A. Combining prior art elements according to known methods to yield predictable results; 

B. Simple substitution of one known element for another to obtain predictable results; 

C. Use of a known technique to improve similar devices in the same way; 

D. Applying a known technique to a known device ready for improvement to yield 
predictable results; 

E. "Obvious to try" - choosing from a finite number of identified, predictable solutions, 
with a reasonable expectation of success; 

F. Some teaching, suggestion, or motivation in the prior art that would have led one of 
ordinary skill to modify the prior art reference or to combine prior art reference teachings to 
arrive at the claimed invention. (All emphasis added.) 

The Examiner does not follow any of the KSR motivation to combine rationales; rather, 
the Examiner simply states that it "would have been obvious... to incorporate Wyler teachings 
into Mantha system... as suggested by Wyler [column 12, lines 1-2], in order to recognize 
irrelevant data." The cited passage from Wyler is: 

"The Second level: Parsing, Analyzing and Converting (into M20 Script Language) the 
Content" 

Nothing in the cited passage (which is merely a passage heading) suggests combining the 
cited teachings to recognize irrelevant data, particularly in light of the guidance provided by 
KSR. 

For reasons cited above, the rejection of Claims 15 and 21 is improper, and should be 
reversed. 

B. The Examiner's rejection of Claims 16 and 22 under 35 USC 103(a) as being 
unpatentable over Mantha, et al. (U.S. Patent No. 6,163,779 - "Mantha") in view of 
Wyler (U.S. Patent No. 7,047,033 - "Wyler") is to be reviewed on Appeal. 

The Examiner's rejection of Claims 16 and 22 is improper since there is no 
motivation to combine features that may be taught in the cited prior art. 

Even if the cited art were to be construed as teaching or suggesting all of the features 
found in Claims 16 and 22, including the feature of establishing that text contains unnecessary 
information if it is shorter than "12 bytes," there is still no motivation to combine these features. 

The proper rationales for arriving at a conclusion of obviousness utilized by KSR are 
described above, and are not reiterated here. The Examiner does not follow any of the KSR 
motivation to combine rationales; rather, the Examiner gives no rationale at all. Appellants 
believe that this silence is due to a prima facie lack of such motivation. 

For the reasons stated, the rejection of Claims 16 and 22 is improper, and should be reversed. 

C. The Examiner's rejection of Claims 17 and 23 under 35 USC 103(a) as being 
unpatentable over Mantha, et al. (U.S. Patent No. 6,163,779 - "Mantha") in view of 
Wyler (U.S. Patent No. 7,047,033 - "Wyler") is to be reviewed on Appeal. 

The Examiner's rejection of Claims 17 and 23 is improper since there is no 
motivation to combine features that may be taught in the cited prior art. 

Even if the cited art were to be construed as teaching or suggesting all of the features 
found in Claims 17 and 23, including the feature of establishing that short blocks of text contain 
unnecessary information if they occur more than "ten times," there is still no motivation to 
combine these features. 

The proper rationales for arriving at a conclusion of obviousness utilized by KSR are 
described above, and are not reiterated here. The Examiner does not follow any of the KSR 
motivation to combine rationales; rather, the Examiner gives no rationale at all. Appellants 
believe that this silence is due to a prima facie lack of such motivation. 

For the reasons stated, the rejection of Claims 17 and 23 is improper, and should be reversed. 

D. The Examiner's rejection of Claim 24 under 35 USC 103(a) as being unpatentable 
over Mantha, et al. (U.S. Patent No. 6,163,779 - "Mantha") in view of Wyler (U.S. 
Patent No. 7,047,033 - "Wyler") is to be reviewed on Appeal. 

1. The Examiner's rejection of Claim 24 is improper since the cited prior art does 
not teach or suggest all of the claimed limitations of the present invention. 

A combination of the cited art does not teach or suggest "identifying unnecessary 
information elements in the HTML document, wherein the unnecessary information elements 
include . . . a block of text in the HTML document that is shorter than a maximum predetermined 
length, and wherein the block of text appears in the HTML document more than a predetermined 
frequency." That is, unnecessary information includes multiple repetitions of a same short block 
of text. 

The Examiner cites Wyler as teaching these limitations at col. 31, line 65 to col. 32, line 
7, and col. 32, lines 47-52, which state: 

"The "logical location" of an object which is interiorly disposed relative to 
the base object is the maximum value e.g. 100. The "logical location" of any 
other object is the distance, on the webpage, of that object from the base object." 

"Classifying one or more objects as cardinal: As described, a base object 
is selected which is the largest object on the webpage. If there is a tie, i.e. if the 
largest two or more objects are similar, to a predetermined extent, in size, then the 
object with the most words in it is typically deemed to be the base object." 

These passages teach that the location of an object can be described as a distance to the 
largest object on a webpage. Appellants respectfully traverse the Examiner's position that this is 
equivalent to multiple short blocks of text being located on an HTML document too many times. 

The Examiner also cites column 32, lines 53-57 of Wyler for teaching the feature of 
determining that a block of text is unnecessary if it is sufficiently short. This passage states: 

Preferably, if the base object is not very big, e.g. falls below a threshold 
defining the minimum size for a base object, then objects adjacent to the base 
object are combined with the base object to generate a "cardinal" of adequate size. 

As defined in column 32, lines 29-34 of Wyler, a "cardinal" is an object that pertains to a 
main subject of the webpage. Thus, a small base object is combined with other objects to 
generate the cardinal object. In this context, it is clear that this small object is significant, since it 
must be used as part of the composition of the cardinal object. Therefore, this passage teaches 
away from the claimed feature of identifying unnecessary information elements in the HTML 
document. 

The Examiner also cites column 15, lines 18-19 of Wyler for teaching that high-frequency 
objects are deemed to be irrelevant ("wherein the block of text appears in the HTML document 
more than a predetermined frequency"). The cited passage in Wyler is: 

"3. Occurrence - the number of alphanumeric strings within a text object or table." 

There is nothing in this passage to suggest that high-frequency objects are irrelevant, as presently 
claimed. Rather, the passage merely states that objects can be counted (with no suggestion of 
how this information may be relevant in determining what information is unnecessary). 

Furthermore, a combination of the cited art does not teach or suggest "crawling only 
anchors found in blocks that have not been deemed meaningless due to containing only the 
unnecessary information elements." The Examiner cites col. 14, lines 39-44 of Wyler for this 
teaching. This passage states: 

Second, the application searches all the documents for words that fit into 
the Index category, and when finding such words, the application inserts an 
"index" command. From that point on, the webpages are called "documents" since 
they have no longer have properties of a webpage. 

This passage states that a crawler ("application") searches all documents for key words 
("that fit into the Index category"). There is no teaching or suggestion of crawling only anchors 
that are in blocks that have not been deemed to have unnecessary information elements. 

The Examiner also cites column 14, lines 15-25 of Wyler, which states: 

The insertion of the M20 script begins with scanning the entire webpage and 
parsing the contents into words related to the webpage commands and words 
related to the user-relevant information. Actually, it is a process of taking the 
additional information off the text itself. Some of the commands that are found 
may be relevant for formatting a document/webpage in a book-style 
format/webpage for devices with screen size and browser limitations. Some may 
be irrelevant (e.g. remarks, search engine keywords, etc.). The relevant commands 
that are found are translated into M20 script language. 

The cited passage is irrelevant to the feature of "crawling only anchors found in blocks 
that have not been deemed meaningless due to containing only the unnecessary information 
elements" (i.e., where the unnecessary information elements include "a block of text in the 
HTML document that is shorter than a maximum predetermined length, and wherein the block of 
text appears in the HTML document more than a predetermined frequency, multiple anchors 
having a same title, image tags that only perform a role of punctuation for text in the HTML 
document, and multiple text blocks having a same description"). That is, the cited passage only 
states that relevant commands may be translated into M20 script language (Markup to Object 
script language). This script conversion is unrelated to crawling, particularly to crawling 
anchors. 

2. The Examiner's rejection of Claim 24 is improper since there is no motivation to 
combine features that may be taught in the cited prior art. 

Even if the cited art were to be construed as teaching or suggesting all of the features 
found in Claim 24, there is still no motivation provided in the cited art or in other known art 
to combine these features. 

The proper rationales for arriving at a conclusion of obviousness, as suggested by the 
U.S. Supreme Court in the case of KSR International Co. v. Teleflex, Inc. et al., 127 S. Ct. 1727 
(2007), include the following tests for determining a motivation to combine elements from the 
prior art: 

A. Combining prior art elements according to known methods to yield predictable results; 

B. Simple substitution of one known element for another to obtain predictable results; 

C. Use of a known technique to improve similar devices in the same way; 

D. Applying a known technique to a known device ready for improvement to yield 
predictable results; 

E. "Obvious to try" - choosing from a finite number of identified, predictable solutions, 
with a reasonable expectation of success; 

F. Some teaching, suggestion, or motivation in the prior art that would have led one of 
ordinary skill to modify the prior art reference or to combine prior art reference teachings to 
arrive at the claimed invention. (All emphasis added.) 

The Examiner does not follow any of the KSR motivation to combine rationales; rather, 
the Examiner simply states that it "would have been obvious... to incorporate Wyler teachings 
into Mantha system... as suggested by Wyler [column 12, lines 1-2], in order to recognize 
irrelevant data." The cited passage from Wyler is: 

"The Second level: Parsing, Analyzing and Converting (into M20 Script Language) the 
Content" 

Nothing in the cited passage (which is merely a passage heading) suggests combining the 
cited teachings to recognize irrelevant data, particularly in light of the guidance provided by 
KSR. 

For the reasons cited above, the rejection of Claim 24 is improper, and should be reversed. 

CONCLUSION 

Appellants have pointed out with specificity the manifest error in the Examiner's 
rejections, and the claim language which renders the invention patentable over the various 
combinations of references. Appellants, therefore, respectfully request that this case be 
remanded to the Examiner with instructions to issue a Notice of Allowance for all pending 
claims. 



Respectfully submitted, 

Reg. No. 44,545 
512-343-6116 

ATTORNEY FOR APPELLANTS 

CLAIMS APPENDIX 

1-14. (canceled) 

15. A method comprising: 

reading an HTML document of a web page as an analyzing object; 

conducting a temporary block analysis based on a description of HTML tags of the 
HTML document; 

using the HTML tags to temporarily divide the HTML document into blocks; 

identifying unnecessary information elements in the HTML document, wherein the 
unnecessary information elements include: 

plural information elements that include an OBJECT_IMAGE having a same 
Uniform Resource Locator (URL), wherein the OBJECT_IMAGE describes a 
type of media used to display the HTML document, 

a block of text in the HTML document that is shorter than a maximum 
predetermined length, and wherein the block of text appears in the HTML 
document more than a predetermined frequency, 

multiple anchors having a same title, 

image tags that only perform a role of punctuation for text in the HTML 
document, and 

multiple text blocks having a same description; 

defining any block in the HTML document that is deemed to be meaningless as an 
OBJECT_DELIMITER, wherein a block is deemed to be meaningless if that block contains only 
said unnecessary information elements and at least one anchor; and 

crawling only anchors found in blocks that have not been defined as 
OBJECT_DELIMITERs. 

16. The method of claim 15, wherein the maximum predetermined length is 12 bytes. 

17. The method of claim 16, wherein the predetermined frequency is ten times. 

18-20. (canceled) 

21. A computer-readable medium encoded with a computer program, wherein the computer 
program, when executed, performs the steps of: 

reading an HTML document of a web page as an analyzing object; 

conducting a temporary block analysis based on a description of HTML tags of the 
HTML document; 

using the HTML tags to temporarily divide the HTML document into blocks; 

identifying unnecessary information elements in the HTML document, wherein the 
unnecessary information elements include: 

plural information elements that include an OBJECT_IMAGE having a same 
Uniform Resource Locator (URL), wherein the OBJECT_IMAGE describes a 
type of media used to display the HTML document, 

a block of text in the HTML document that is shorter than a maximum 
predetermined length, and wherein the block of text appears in the HTML 
document more than a predetermined frequency, 

multiple anchors having a same title, 

image tags that perform a role of punctuation for text in the HTML document, and 

multiple text blocks having a same description; 

defining any block in the HTML document that is deemed to be meaningless as an 
OBJECT_DELIMITER, wherein a block is deemed to be meaningless if that block contains only 
said unnecessary information elements; and 

crawling only anchors found in blocks that have not been defined as 
OBJECT_DELIMITERs. 

22. The computer-readable medium of claim 21, wherein the maximum predetermined length is 
12 bytes. 

23. The computer-readable medium of claim 21, wherein the predetermined frequency is ten 
times. 

24. A method comprising: 

dividing an HTML document into blocks; 

identifying unnecessary information elements in the HTML document, wherein the 
unnecessary information elements include: 

a block of text in the HTML document that is shorter than a maximum 
predetermined length, and wherein the block of text appears in the HTML 
document more than a predetermined frequency, 

multiple anchors having a same title, 

image tags that only perform a role of punctuation for text in the HTML 
document, and 

multiple text blocks having a same description; 

defining any block in the HTML document that is deemed to be meaningless, wherein a 
block is deemed to be meaningless if that block contains only the unnecessary information 
elements and at least one anchor; and 

crawling only anchors found in blocks that have not been deemed meaningless due to 
containing only the unnecessary information elements. 



JP920020109US1 - Appeal Brief 
Serial No. 10/621,474 

EVIDENCE APPENDIX 

Other than the Office Action(s) and reply(ies) already of record, no additional evidence 
has been entered by Appellants or the Examiner in the above-identified application which is 
relevant to this appeal. 

RELATED PROCEEDINGS APPENDIX 

There are no related proceedings as described by 37 C.F.R. §41.37(c)(1)(x) known to 
Appellants, Appellants' legal representative, or assignee. 